Diffusion Processes
This week, we begin our second major topic block: diffusion processes. Diffusion refers to the transmission of an entity through a network. This transmission can involve not only the spread of the entity but also resistance to it, its disappearance, or transformation over time.
The entities that diffuse through networks can take many forms. For example, an infectious disease can spread through a population, a wildfire can move through a landscape, memes can travel across social media platforms, and ideas or practices can propagate within a culture. While diseases and ideas might appear quite different, we will see that the underlying diffusion mechanisms are remarkably similar. The distinctions between them can often be modeled through minor changes in our code.
You can download the complete code here.
🎯 Learning Goals
By the end of this chapter, you will be able to:
- Explain the basic mechanics of the SIR and SIS models and their core components.
- Run and analyze multiple replications of stochastic simulations.
- Understand what it means for a diffusion process to be super- or subcritical.
SIR Model
Let us begin with the most basic diffusion model: the SIR model. In this model, each individual in the network can be in one of three states:
Susceptible (S): Individuals who have not yet been affected but are at risk of being infected.
Infected (I): Individuals who are currently affected and can transmit the infection to others.
Removed (R): Individuals who are no longer susceptible or infectious; they are immune.
While this model originates from epidemiology, it can be adapted to many social science contexts. Consider the diffusion of innovations, for example:
Susceptible: Individuals who have not yet heard of the innovation but may adopt it in the future.
Infected: Individuals who have adopted the innovation and are actively promoting it—intentionally or unintentionally.
Removed: Individuals who either still use the innovation but no longer promote it, or those who have abandoned it and stopped advocating for it. Crucially, they no longer influence others.
If you’re wondering whether these two interpretations of the Removed state should be treated separately—you are absolutely right! We’ll explore multi-type models soon, but for now we will confine ourselves to these three types.
In this chapter, we will primarily use the language of disease spread. However, keep in mind that the same logic can be applied to other phenomena like cultural ideas, technological adoption, or behavioral norms. I encourage you to consider these alternative interpretations throughout the chapter—it will sharpen your critical thinking about which modeling choices are better suited for different types of diffusion.
Initializing
Let’s begin by setting up our simulation. As before, we first define a model object. For now, it only specifies the size of the world and the seed:
library(tidyverse)
model <- list(
# --- world generation ---
world_size = 21, # creates a world of 21x21 vertices
# --- replication ---
seed = 42 # the initial seed for creating the same world over and over
)

Next, we define the init_world() function. Because we use a lattice network, this function closely resembles the one used in the Schelling segregation model:
library(igraph)
init_world <- function(model) {
# For reproducibility we set a seed
set.seed(model$seed)
m <- model$world_size
# 1) generate the lattice network
agents <- make_lattice(
dimvector = c(m, m),
nei = 1
)
# 2) set the type: 1 = Susceptible, 2 = Infected, 3 = Removed
# all agents start susceptible
V(agents)$type <- 1
# but for one agent in the center of the network
center <- length(V(agents)$type) %/% 2 + 1
V(agents)$type[center] <- 2
list(
agents = agents,
nbrs = adjacent_vertices(agents, V(agents))
)
}

Here, we save the graph as agents, with each vertex having a type attribute. Since the network remains fixed, we also precompute each agent’s neighbors and store them in nbrs to optimize later computations.
Now, we initialize the world and assign colors to agents based on their type:
world <- init_world(model)
get_type_colors <- function(types) {
colors <- c("#a5d875", # 1 = S
"#eb6a6a", # 2 = I
"#73bcde") # 3 = R
return(colors[types])
}
V(world$agents)$color <- get_type_colors(V(world$agents)$type)

We can now visualize the world:
Defining a Step
We now define the run_step() function to simulate a single time step:
run_step <- function(world, model){
# get the agents and the neighbors
agents <- world$agents
nbrs <- world$nbrs
# get their types
types <- V(agents)$type
# copy them to later update them
new_types <- types
# 1) infectious --> removed
# find all agents that are infectious
infected_idx <- which(types == 2)
# this infectious agent now becomes a `Removed`
new_types[infected_idx] <- 3
# 2) loop through all infectious agents
for (v in infected_idx) {
# infect susceptible neighbors (using the precomputed nbrs list)
for (u in nbrs[[v]]) {
if (new_types[u] == 1) {
new_types[u] <- 2
}
}
}
# update agents and world
V(agents)$type <- new_types
world$agents <- agents
return(world)
}

Note that all currently infectious agents are first marked for removal in new_types. When we then loop through the infectious agents, however, we use the old types: even though these agents will be Removed next turn, they can still infect others in the current step.
To make this more engaging, we’ve embedded an interactive JavaScript version of the simulation:
Adding Probabilities
The simulation above, while functional, is highly deterministic: infected agents always infect all susceptible neighbors and then transition to the removed state. To make this process more realistic, we introduce probabilistic transitions.
We modify our model to include two parameters:
transmit_prob: the probability that an infected agent transmits the infection to a susceptible neighbor.
I_R_prob: the probability that an infected agent transitions to the removed state in a given step.
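Both probabilities are applied with the same vectorized pattern, runif(n) < p, which produces n independent Bernoulli draws at once. A minimal base-R sketch (the variable names here are illustrative, not part of the model code):

```r
# draw one uniform number per candidate agent and keep those below the probability
set.seed(1)
candidates <- 1:10
p <- 0.5
selected <- candidates[runif(length(candidates)) < p]

# edge cases: p = 0 selects no one; p = 1 selects everyone,
# since runif() draws from the open interval (0, 1)
none        <- candidates[runif(length(candidates)) < 0]
all_of_them <- candidates[runif(length(candidates)) < 1]
```

This is why we can handle all infected agents in one line instead of looping over them for the recovery step.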
library(tidyverse)
model <- list(
# --- world generation ---
world_size = 21, # creates a world of 21x21 vertices
# --- transition probabilities ---
transmit_prob = 0.5, # infected agents transmit the infection with 50% per contact
I_R_prob = 1.0, # agents become `Removed` with certainty after one time step
# --- replication ---
seed = 42 # the initial seed for creating the same world over and over
)

We then modify run_step() accordingly:
run_step <- function(world, model){
# ...
# 1)
I_R_idx <- infected_idx[runif(length(infected_idx)) < model$I_R_prob]
new_types[I_R_idx] <- 3
# 2)
# ...
if (new_types[u] == 1 && runif(1) < model$transmit_prob) {
new_types[u] <- 2
}
# ...
}

Show R Code (Completely updated run_step() function)
run_step <- function(world, model){
# get the agents and the neighbors
agents <- world$agents
nbrs <- world$nbrs
# get their types
types <- V(agents)$type
# copy them to later update them
new_types <- types
# 1) infectious --> removed
# find all agents that are infectious
infected_idx <- which(types == 2)
I_R_idx <- infected_idx[runif(length(infected_idx)) < model$I_R_prob]
new_types[I_R_idx] <- 3
# 2) loop through all infectious agents
for (v in infected_idx) {
# infect susceptible neighbors
for (u in nbrs[[v]]) {
if (new_types[u] == 1 && runif(1) < model$transmit_prob) {
new_types[u] <- 2
}
}
}
# update agents and world
V(agents)$type <- new_types
world$agents <- agents
return(world)
}

We would expect the first infected agent to infect roughly 2 of its 4 neighbors (given 50% transmission). As the infection spreads, each subsequent agent has a lower chance of infecting others, both because of overlapping contacts and because agents on the edges and corners of the lattice have fewer neighbors. In such a world, is it likely (say, more often than not) that the entire network ends up made of removed agents when the process terminates (i.e. no infected agents are left)? We can begin to explore these questions using an updated JavaScript simulation:
You can adjust these parameters while the simulation is running! Try different combinations and observe how the dynamics change.
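Before analyzing this systematically, we can sanity-check the expectation above with a quick base-R calculation (the variable names are ours, not part of the model code):

```r
# 4 lattice neighbors, 50% transmission probability per contact
n_neighbors <- 4
p <- 0.5

# expected number of neighbors the first agent infects
expected_infections <- n_neighbors * p

# probability the outbreak survives the very first step,
# i.e. at least one neighbor gets infected
p_at_least_one <- 1 - (1 - p)^n_neighbors
```

So the first agent infects 2 neighbors on average, and the outbreak survives the first step with probability 0.9375.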
While the interactive version offers intuitive insights, it doesn’t yet provide systematic answers to key analytical questions like:
What is the average final percentage of removed agents? (Let’s call this metric the final removal rate for now.)
How does this rate vary across combinations of transmit_prob and I_R_prob?
How long does it typically take for half the population to be removed, if it happens at all?
The Full R Model
To answer our research questions quantitatively, we return to our R code and build a pipeline that runs multiple simulations and collects summary statistics. We begin by defining a run_sequence() function, which simulates a single diffusion process.
run_sequence <- function(model) {
# 1) Initialization
world <- init_world(model)
# 2) prepare data storage
shares_series <- matrix(NA_real_, nrow = model$max_t, ncol = 3,
dimnames = list(NULL, c("S","I","R")))
# 3) iterate
for (t in seq_len(model$max_t)) {
# Store the data from world
shares_series[t,] <- get_type_shares(world, 3)
# stop if no infectious agents are left
if (shares_series[t, "I"] == 0) {
break
}
# Perform one step
world <- run_step(world, model)
}
# prepare the final world data
df_final_world <- as_data_frame(world$agents, what = "vertices") |>
mutate(id = row_number(), # add id
type = V(world$agents)$type, # add agents type
neighbors = adjacent_vertices(world$agents, V(world$agents)), # add their neighbors
step = t # add how long it took to terminate
)
# prepare the time series data
df_time_series <- shares_series |>
as_tibble() |> # turn from matrix into data frame
filter(!is.na(S)) |> # drop all empty rows
mutate(step = row_number()) # add time
# 4) Return final state and summary
list(
df_final_world = df_final_world,
df_time_series = df_time_series
)
}

Let’s take a moment to understand what’s happening here. This function builds directly on what we’ve done before, but now we’re not just simulating—we’re collecting data in a systematic way so we can later analyze the outcomes across many runs.
Here’s the basic flow:
Initialization: We start by generating the world. This is the exact same world setup you’ve already seen.
Preparing Data Storage: Before we run the model, we create a matrix called shares_series that will store the proportions of Susceptible (S), Infected (I), and Removed (R) agents at every time step. Each row corresponds to one step in time, and each column to one type of agent.
Running the Simulation: For up to max_t steps, we:
- Record the current proportions of S, I, and R using the helper function get_type_shares() (see below for the code).
- Check if there are any infected agents left. If not, we stop early—why simulate further if nothing’s changing?
- Otherwise, we run one step of the simulation.
Capturing the Final State: Once the simulation finishes (either by reaching the time limit or running out of infections), we create a snapshot of the final world. Each agent gets an ID, a state, a list of neighbors, and the total number of steps the simulation took.
Organizing the Time Series: The matrix of proportions is turned into a tidy data frame so that we can easily analyze and visualize how things evolved over time.
All of this is returned in a list, which gives us a complete picture: what happened at each moment, and what the final situation looked like. This structure sets us up beautifully for running multiple simulations and comparing their results.
As mentioned before, to track the proportion of each type, we define a helper function get_type_shares():
get_type_shares <- function(world,
num_of_types = 3) {
# get the types as a vector
types <- V(world$agents)$type
# count how many of each type
counts <- tabulate(types, nbins = num_of_types)
# return proportions S, I, R
counts / length(types)
}

This gives us a vector of size 3 with the relative shares of S, I, and R, respectively.
The use of nbins ensures all categories (S, I, R) are represented even if one is absent. For instance, compare the following two lines of code:
tabulate(c(1,1,1,2)) # returns only for types 1 and 2
[1] 3 1
tabulate(c(1,1,1,2), nbins = 3) # includes a zero for type 3
[1] 3 1 0
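Putting both pieces together, the proportion computation inside get_type_shares() can be checked on a hand-made type vector:

```r
# toy type vector: three Susceptible, one Infected, no Removed
types <- c(1, 1, 1, 2)

# counts per type (including the absent type 3), divided by population size
shares <- tabulate(types, nbins = 3) / length(types)
# shares is now 0.75, 0.25, 0 for S, I, and R
```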
We now define a simplified version of run_replications(). This version is streamlined for clarity and ease of use in this tutorial. A more comprehensive version—identical to the one used earlier for the Prisoner’s Dilemma simulations—is available in the downloadable course materials.
# functions_simulation.R
run_replications <- function(model,
replications = 10,
master_seed = 42,
ncores = parallel::detectCores() - 1
) {
# 1) prepare the data storage
results <- vector("list", replications)
# 2) run the replication
for (r in seq_len(replications)) {
model$seed <- master_seed + r
results[[r]] <- run_sequence(model)
}
# 3) extract data
get_data_replication(results)
}

Once we’ve run all replications, we need to combine and analyze the output. This is the task of get_data_replication():
get_data_replication <- function(results){
# 1) prepare data for the generation of the data frames
df_time_series_rep <- map_dfr(results, `[[`, "df_time_series", .id = "rep")
df_final_worlds_rep <- map_dfr(results, `[[`, "df_final_world", .id = "rep")
replications <- length(results)
# 2) since sequences can have different lengths
# we need to fill in
max_step <- max(df_time_series_rep$step)
df_time_series_rep_filled <- df_time_series_rep %>%
group_by(rep) %>%
complete(step = seq_len(max_step)) %>% # 1,2,…,max_step
arrange(rep, step) %>%
fill(-rep, -step) %>% # fill every other column
ungroup()
# 3) statistics over the course of the simulation
# calculate mean and sd share at each time step
df_time_series <- df_time_series_rep_filled |>
group_by(step) |> # then average over reps
summarise(
S_share_mean = mean(S, na.rm = TRUE),
S_share_sd = if (n() > 1) sd(S, na.rm = TRUE) else 0,
I_share_mean = mean(I, na.rm = TRUE),
I_share_sd = if (n() > 1) sd(I, na.rm = TRUE) else 0,
R_share_mean = mean(R, na.rm = TRUE),
R_share_sd = if (n() > 1) sd(R, na.rm = TRUE) else 0,
replications = replications,
.groups = "drop"
)
# 4) return results
list (
df_time_series = df_time_series,
df_final_worlds = df_final_worlds_rep
)
}

As with previous projects, map_dfr() helps us consolidate the individual results into tidy data frames. The final world states can be used directly, but to summarize the time series, we need to account for a small complication: different simulations may end after different numbers of time steps. One run might end after 30 steps; another might continue until 60. This difference arises because we stop simulations when no infected agents remain. To ensure fair comparison, we fill in the missing steps for shorter runs. Since nothing changes after termination, we simply repeat the last observed values. Here’s how the process works:
- complete(step = seq_len(max_step)) makes sure every step is present for every replication. Missing steps are filled with NA.
- fill(-rep, -step) replaces each NA with the most recent value above it, reflecting that nothing changes after the infection dies out.
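Here is a minimal, self-contained sketch of that padding step on a toy time series (two replications of different lengths; the values are made up and only the S column is shown):

```r
library(tidyverse)

# rep "1" ran 2 steps, rep "2" ran 4 steps
toy <- tibble(
  rep  = c("1", "1", "2", "2", "2", "2"),
  step = c(1L, 2L, 1L, 2L, 3L, 4L),
  S    = c(0.9, 0.8, 0.9, 0.7, 0.6, 0.6)
)

toy_filled <- toy %>%
  group_by(rep) %>%
  complete(step = 1:4) %>%   # pad every rep out to 4 steps (NA where missing)
  arrange(rep, step) %>%
  fill(S) %>%                # carry the last observed value downward
  ungroup()
```

After this, rep "1" has four rows as well, with its final observed value 0.8 repeated for steps 3 and 4.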
Now we’re ready to run a batch of simulations and inspect the results.
model <- list(
# --- world generation ---
world_size = 21, # creates a world of 21x21 vertices
transmit_prob = 0.5,
I_R_prob = 1.0,
max_t = 200,
# --- replication ---
seed = 42 # the initial seed for creating the same world over and over
)
results <- run_replications(model, replications = 100)

Visualizing the Results
Let’s write a plotting function to visualize the average trajectories of the three agent types:
plot_avg_SIR <- function(df) {
# 1) determine number of replicates
if (!"replications" %in% names(df)) {
stop("`df` must contain a `replications` column with the total number of runs.")
}
n_rep <- max(df$replications, na.rm = TRUE)
# 2) pivot longer on *_mean and *_sd, extract series, compute CIs
df_long <- df %>%
pivot_longer(
cols = matches("_(mean|sd)$"),
names_to = c("series","stat"),
names_pattern = "(.+?)_(mean|sd)$",
values_to = "value"
) %>%
pivot_wider(names_from = stat, values_from = value) %>%
# series might be e.g. "S_share", "I_fit", "R_weight_fit" → grab first letter
mutate(
Type = substr(series, 1, 1),
ci_lower = mean - 1.96 * (sd / sqrt(n_rep)),
ci_upper = mean + 1.96 * (sd / sqrt(n_rep))
)
my_cols <- c(S = "#a5d875", # Susceptible
I = "#eb6a6a", # Infected
R = "#73bcde") # Removed
# 3) plot
ggplot(df_long, aes(x = step, y = mean, color = Type, fill = Type, group = Type)) +
geom_ribbon(aes(ymin = ci_lower, ymax = ci_upper), alpha = 0.2, color = NA) +
geom_line() +
scale_color_manual(values = my_cols) +
scale_fill_manual(values = my_cols) +
labs(
x = "Step",
y = "Proportion"
) +
theme_minimal()
}

The downloadable code includes a more flexible version called plot_avg_type(). With it, you can specify which types to include using show (e.g. c("S","I")), stratify by groups using group_var, and optionally pass the number of replications.
Let’s plot the result:
print(plot_avg_SIR(results$df_time_series))

Finally, let’s check the outcome of the last simulation step:

results$df_time_series |> filter(step == max(step))
# A tibble: 1 × 8
step S_share_mean S_share_sd I_share_mean I_share_sd R_share_mean R_share_sd
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 70 0.676 0.241 0 0 0.324 0.241
# ℹ 1 more variable: replications <int>
We observe that, on average, the final share of Removed agents is only about 32%. This suggests that under the given parameters, the infection typically dies out before reaching most of the population.
Try to answer the following question: What is the share of simulations that reach an R share of over 50%?
SIS and SIRS Model
So far, we’ve modeled scenarios where infected agents become immune after infection—that is, they transition from Infected to Removed. This is characteristic of many real-world diseases like chickenpox, where recovery often leads to lasting immunity.
However, not all infections grant immunity. Consider the flu: people can catch it again and again, either because their immunity wanes or because the virus mutates. To capture this, we introduce a variation called the SIS model. In this setup, agents can transition from Infected back to Susceptible, reflecting the cyclical nature of some infections. As a result, there are only two states: Susceptible (S) and Infected (I).
Even more realistically, we can also allow for fading immunity: an infected agent might become first immune (R), and then in a second step return to being susceptible (S). This leads us to the SIRS model, which incorporates all three states with flexible transitions depending on probabilities.
To implement this in our model, we add logic inside the run_step() function. Specifically, we place the following code after we’ve handled the transition from Infected to Removed. It targets only those agents who are Removed and transforms them, with probability R_S_prob, back into Susceptibles.
run_step <- function(world, model){
# ...
# 1B) removed --> susceptible
# get all agents that are R
removed_idx <- which(new_types == 3)
# pick those where the random draw is < R_S_prob
R_S_idx <- removed_idx[runif(length(removed_idx)) < model$R_S_prob]
# set them back to S all at once
new_types[R_S_idx] <- 1
# ...
}

If R_S_prob = 1, every Removed agent is immediately turned back into a Susceptible, effectively eliminating the Removed state from the simulation. In this case we have a SIS model; otherwise we have a SIRS model.
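To see why R_S_prob = 1 collapses SIRS into SIS, here is a stripped-down base-R sketch of the two update stages acting on a hand-made type vector (the vector itself is illustrative, not taken from the simulation):

```r
set.seed(42)

# 1 = Susceptible, 2 = Infected, 3 = Removed
new_types <- c(1, 2, 2, 3)
I_R_prob <- 1.0
R_S_prob <- 1.0

# stage 1A: infected --> removed
infected_idx <- which(new_types == 2)
I_R_idx <- infected_idx[runif(length(infected_idx)) < I_R_prob]
new_types[I_R_idx] <- 3

# stage 1B: removed --> susceptible
removed_idx <- which(new_types == 3)
R_S_idx <- removed_idx[runif(length(removed_idx)) < R_S_prob]
new_types[R_S_idx] <- 1

# with both probabilities at 1, no agent can remain Removed
```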
Here’s the full version of the updated run_step() function for clarity:
Show R Code (Completely updated run_step() function)
run_step <- function(world, model){
# get the agents and the neighbors
agents <- world$agents
nbrs <- world$nbrs
# get their types
types <- V(agents)$type
# copy them to later update them
new_types <- types
# 1A) infectious --> removed
# find all agents that are infectious
infected_idx <- which(types == 2)
# pick those where the random draw is < I_R_prob
I_R_idx <- infected_idx[runif(length(infected_idx)) < model$I_R_prob]
# move them to R all at once
new_types[I_R_idx] <- 3
# 1B) removed --> susceptible
# get all agents that are R
removed_idx <- which(new_types == 3)
# pick those where the random draw is < R_S_prob
R_S_idx <- removed_idx[runif(length(removed_idx)) < model$R_S_prob]
# set them back to S all at once
new_types[R_S_idx] <- 1
# 2) loop through all infectious agents
for (v in infected_idx) {
# infect susceptible neighbors
for (u in nbrs[[v]]) {
if (new_types[u] == 1 && runif(1) < model$transmit_prob) {
new_types[u] <- 2
}
}
}
# update agents and world
V(agents)$type <- new_types
world$agents <- agents
return(world)
}

To help you explore the SIS dynamics (i.e. R_S_prob = 1) interactively, I’ve prepared a JavaScript version of the simulation. This version uses a larger world and a clustered initial infection to make the start a bit more stable.
Try the following scenarios to observe different behaviors:
- Transmission rate: 10%, Infectious → Susceptible: 50%
- Transmission rate: 50%, Infectious → Susceptible: 50%
Because agents never transition to a Removed state, the infection is never permanently eliminated. It simply cycles through the population, hopping from node to node. Even a small, finite grid can sustain an SIS infection for a long time.
The Critical Threshold
You may notice something interesting: when the transmission rate is low (say, 10%), the infection tends to die out. But at higher values (like 50%), it persists and spreads. This illustrates the concept of the critical threshold.
When a diffusion process spreads and sustains itself indefinitely—essentially infecting a non‐negligible fraction of the network—we say it has gone critical. In everyday language, we might say an idea or behavior has “gone viral” as it spreads unstoppably through the network.
There exists a precise tipping point—the critical threshold—that separates these two regimes:
Subcritical: the infection always dies out in the long run.
Here the basic reproduction number \(R_0\) is strictly less than 1. In this regime every infected node infects, on average, fewer than one new node. A classic result from branching-process theory then tells us the extinction probability \(q\) satisfies
\[ q = 1, \]
meaning with probability 1 the chain of transmission will eventually fizzle out.
Supercritical: the infection may survive indefinitely.
Once \(R_0 > 1\), each infected node infects, on average, more than one new node. In this case the extinction probability \(q\) is strictly less than 1 (but not zero):
\[ 0 < q < 1. \]
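For completeness (this is standard branching-process theory, not something derived in this chapter): the extinction probability \(q\) is the smallest solution in \([0,1]\) of the fixed-point equation of the offspring probability generating function \(G\),

\[ q = G(q), \qquad \text{e.g. } q = e^{R_0 (q - 1)} \text{ for Poisson}(R_0)\text{ offspring.} \]

For \(R_0 \le 1\) the smallest solution is \(q = 1\); for \(R_0 > 1\) it drops strictly below 1, matching the two regimes above.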
Thus even though the process can “take off,” persistence isn’t guaranteed. There remains a chance—albeit a smaller one—that by sheer bad luck every infected individual fails to transmit before recovering, and the outbreak still dies out. Yet if \(q\) is small enough, such an unlucky (or, depending on your perspective, lucky) event will in practice essentially never occur, even on timescales approaching the heat death of the universe.
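We can check these claims numerically with a small Monte Carlo sketch of a Galton–Watson branching process with Poisson(\(R_0\)) offspring. This is base R only; the function name and cutoffs are our own choices, not part of the chapter’s model code:

```r
set.seed(42)

# simulate one Galton-Watson chain; TRUE if it goes extinct
went_extinct <- function(R0, max_gen = 200, explosion_cutoff = 1e4) {
  n <- 1                                     # one initial infected individual
  for (g in seq_len(max_gen)) {
    if (n == 0) return(TRUE)                 # the chain has died out
    n <- sum(rpois(n, R0))                   # each individual infects Poisson(R0) others
    if (n > explosion_cutoff) return(FALSE)  # treat explosion as survival
  }
  n == 0
}

# estimate the extinction probability q in both regimes
q_sub   <- mean(replicate(1000, went_extinct(0.8)))  # subcritical: q should be about 1
q_super <- mean(replicate(1000, went_extinct(1.5)))  # supercritical: 0 < q < 1 (theory: about 0.42)
```

Note that this branching process ignores the lattice structure entirely: it is the idealized, well-mixed limit, which is exactly the setting where the classic extinction results hold.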
🧩 Conclusion
In this chapter, we took a deep dive into the world of diffusion processes. We started by introducing the classic SIR model, where agents can be Susceptible, Infected, or Removed. We added probabilities to make infection and recovery more realistic. Then, we moved beyond the standard SIR setup. We saw that not all infections lead to permanent immunity. By allowing agents to return from Removed to Susceptible, we introduced the SIS model. Finally, we encountered the idea of a critical threshold: a tipping point beyond which diffusion can persist.
Together, these concepts form a powerful toolkit for understanding how ideas, behaviors, and diseases spread through networks. And we’re just getting started—there’s much more to uncover in future chapters.
Onward!
🏋 Pre-Class Assignment
Before the next session, take some time to engage with the model hands-on.
[Easy] Determine, in the latest JavaScript version, the critical threshold (for an Infectious → Susceptible rate of 90%).
[Easy] Download the code. I have built an R Shiny app that you can run. Play around with it.
[Medium] In the Shiny app you will find a slider for the share of initial Removed agents. You can think of them as the agents that are already vaccinated at the beginning of an outbreak. This is not yet implemented in init_world(); find a way to place them there!
[Medium] Use your model from the previous task, play around with the parameters, and tell us what you learned!
🚀 Link to In-Class Material
Here is the link for the in-class material.
⚙️ Some Advanced Stuff
Finally, I designed a flexible version of get_data_simulation() that should work for any of our simulations. This is convenient because we then do not have to adapt the function each time we build a different simulation.
get_data_simulation <- function(results){
# 1) bind all df_process together, and all df_final together
df_names <- results %>%
map(~ keep(.x, is.data.frame)) %>% # for each run, keep only data.frames
map(names) %>% # extract their names
flatten_chr() %>% # flatten into one character vector
unique() # and dedupe
# 2) for each such name, pull it out of every result and bind_rows
out <- map(df_names, function(nm) {
results %>%
map(nm) %>% # extract element nm from each run
keep(is.data.frame) %>% # drop any NULLs or non-df’s
bind_rows() # stack them
})
# 3) Name the list and return
set_names(out, df_names)
}